Molecular Ecology Resources
Wiley
All preprints, ranked by how well they match Molecular Ecology Resources's content profile, based on 161 papers previously published here. The average preprint has a 0.06% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Elbrecht, V.; Bourlat, S. J.; Hoerren, T.; Lindner, A.; Mordente, A.; Noll, N. W.; Sorg, M.; Zizka, V. M. A.
Small and rare specimens can remain undetected when metabarcoding bulk samples with a high size heterogeneity of specimens. This is especially critical for Malaise trap samples, where most of the biodiversity is often contributed by small specimens. How to size sort and in which proportions to pool these samples has not been widely explored. We set out to find a size sorting strategy that maximizes taxonomic recovery but remains highly scalable and time efficient. Three Malaise trap samples were size sorted into 4 size classes using dry sieving. Each fraction was homogenized and lysed. The corresponding lysates were pooled to simulate samples never sorted, pooled in equal proportions and in 4 different proportions favoring the small size fractions. DNA from the pooled fractions as well as the individual size classes was extracted and metabarcoded using the FwhF2 and Fol-degen-rev primer set. Additionally, wet sieving strategies were explored. The small size fractions harbored the highest diversity, and were best represented when pooling in favor of small specimens. Not size sorting a sample leads to a 45-77% decrease in taxon recovery compared to size sorted samples. A size separation into only 2 fractions (below 4 mm and above) can already double taxon recovery compared to not sorting. However, increasing the sequencing depth 3-4 fold can also increase taxon recovery to comparable levels, but remains biased toward biomass rich taxa in the sample. We demonstrate that size fractionating bulk Malaise samples can increase taxon recovery. The most practical approach is wet sieving into two size fractions, and proportional pooling of the lysates in favor of the small size fraction (80-90% volume). However, in large projects with time constraints, increasing sequencing depth can also be an alternative solution.
de Flamingh, A.; Ishida, Y.; Pecnerova, P.; Vilchis, S.; Siegismund, H.; van Aarde, R.; Malhi, R.; Roca, A.
Non-invasive biological samples benefit studies that investigate rare, elusive, endangered, and/or dangerous species. Integrating genomic techniques that use non-invasive biological samples with advances in computational approaches can benefit and inform wildlife conservation and management. Here we present a molecular pipeline that uses non-invasive fecal DNA samples to generate low- to medium-coverage genomes (e.g., >90% of the complete nuclear genome at 6X coverage) and metagenomic sequences, combining in a novel fashion widely available and accessible DNA collection cards with commonly used DNA extraction and library building approaches. DNA preservation cards are easy to transport and can be stored non-refrigerated, avoiding cumbersome and/or costly sample methods. The genomic library construction and shotgun sequencing approach did not require enrichment or targeted DNA amplification. The utility and potential of the data generated by this pipeline was demonstrated by the application of genome-scale analysis and metagenomics to zoo and free-ranging African savanna elephants (Loxodonta africana). Fecal samples collected from free-ranging individuals contained an average of 12.41% (5.54-21.65%) endogenous elephant DNA. Clustering of these elephants with others from the same geographic region was demonstrated by a principal component analysis of genetic variation using nuclear genome-wide SNPs. Metagenomic analyses generated compositional taxon classifications that included Loxodonta, green plants, fungi, arthropods, bacteria, viruses and archaea, showcasing the utility of our approach for addressing complementary questions based on host-associated DNA, e.g., pathogen and parasite identification. The molecular pipeline presented here extends applications beyond what has previously been shown for target-enriched datasets and contributes towards the expansion and application of genomic techniques to conservation science and practice.
Rancilhac, L.; Sylvestre, F.; Hutter, C. R.; Arntzen, J. W.; Babik, W.; Crochet, P.-A.; Deso, G.; Duguet, R.; Galan, P.; Pabijan, M.; Policain, M.; Priol, P.; Sabino-Pinto, J.; Capstick, M.; Elmer, K. R.; Dufresnes, C.; Vences, M.
Restriction site-Associated DNA sequencing (RADseq) has great potential for genome-wide systematics studies of non-model organisms. However, accurately assembling RADseq reads into orthologous loci remains a major challenge in the absence of a reference genome. Traditional assembly pipelines cluster putative orthologous sequences based on a user-defined clustering threshold. Because improper clustering of orthologs is expected to affect results in downstream analyses, it is crucial to design pipelines for empirically optimizing the clustering threshold. While this issue has been largely discussed from a population genomics perspective, it remains understudied in the context of phylogenomics and coalescent species delimitation. To address this issue, we generated RADseq assemblies of representatives of the amphibian genera Discoglossus, Rana, Lissotriton and Triturus using a wide range of clustering thresholds. Particularly, we studied the effects of the intra-sample Clustering Threshold (iCT) and between-sample Clustering Threshold (bCT) separately, as both are expected to differ in multi-species data sets. The obtained assemblies were used for downstream inference of concatenation-based phylogenies, and multi-species coalescent species trees and species delimitation. The results were evaluated in the light of a reference genome-wide phylogeny calculated from newly generated Hybrid-Enrichment markers, as well as extensive background knowledge on the species systematics. Overall, our analyses show that the inferred topologies and their resolution are resilient to changes of the iCT and bCT, regardless of the analytical method employed. Except for some extreme clustering thresholds, all assemblies yielded identical, well-supported inter-species relationships that were mostly congruent with those inferred from the reference Hybrid-Enrichment data set. Similarly, coalescent species delimitation was consistent among similarity threshold values. 
However, we identified a strong effect of the bCT on the branch lengths of concatenation and species trees, with higher bCTs yielding trees with shorter branches, which might be a pitfall for downstream inferences of evolutionary rates. Our results suggest that the choice of assembly parameters for RADseq data in the context of shallow phylogenomics might be less challenging than previously thought. Finally, we propose a pipeline for empirical optimization of the iCT and bCT, implemented in optiRADCT, a series of scripts readily usable for future RADseq studies.
Iwaszkiewicz-Eggebrecht, E.; Granqvist, E.; Nowak, K. H.; Valdivia, C.; Buczek, M.; Srivathsan, A.; Hartop, E.; Miraldo, A.; Roslin, T.; Tack, A. J. M.; Lukasik, P.; Meier, R.; Ronquist, F.
1. DNA metabarcoding--high-throughput sequencing of barcode regions from bulk samples--has become a key tool for insect biodiversity assessment. Yet, how methodological choices affect the accuracy of metabarcoding data remains insufficiently explored. In this paper, we ask: (1) How does the lysis method (non-destructive lysis vs. destructive homogenization) affect community recovery? (2) How comprehensively does metabarcoding capture species richness? (3) To what extent can spike-ins improve abundance estimates? (4) How accurately can species abundances be estimated? 2. We evaluated the accuracy of insect metabarcoding using 4,749 bulk samples from a large-scale biodiversity survey subjected to mild lysis. Of these samples, 856 were also homogenized, allowing a systematic comparison of the effect of alternative treatments. To potentially improve abundance estimates, we added six biological spike-ins (i.e., foreign insects) to all samples, and two synthetic spike-ins (artificial DNA fragments) to the homogenization treatment. In addition, we established the contents of 15 samples by individually barcoding all specimens, enabling direct assessment of occurrence and abundance estimates. 3. Our results revealed consistent differences between destructive and non-destructive treatments. While both methods reliably detected the majority of species, small and soft-bodied taxa were more often recovered after mild lysis than after homogenization, while the reverse was true for heavily sclerotized, hairy, and large taxa. Using biological spike-ins for calibration reduced the variance in read numbers per specimen considerably, especially in homogenized samples, while synthetic spike-ins were less effective. In a Bayesian analysis, where species data were matched to the best-fitting spike-in calibration curve, accurate abundance estimates (+/-1 individual) were obtained for 72.9% of species occurrences. 4. 
Our results show that it is possible to obtain reasonably accurate abundance estimates from metabarcoding data, and that mild lysis and homogenization result in different taxon-specific biases in terms of occurrence data, with neither method outperforming the other. Accuracy is improved by homogenization rather than mild lysis of samples, and by the use of biological rather than synthetic spike-ins. Together, these findings provide a major step towards robust, quantitative biodiversity monitoring using DNA-metabarcoding.
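The spike-in idea in the abstract above can be illustrated with a deliberately simple ratio calibration. This is a minimal sketch, not the authors' Bayesian method: it assumes every individual in a sample yields roughly the same number of reads as an average spike-in individual, and the function and parameter names are hypothetical.

```python
def reads_per_individual(spike_reads, spike_counts):
    """Average number of reads produced by one spike-in individual in a sample."""
    total_individuals = sum(spike_counts)
    if total_individuals <= 0:
        raise ValueError("need at least one spike-in individual")
    return sum(spike_reads) / total_individuals


def estimate_abundance(species_reads, spike_reads, spike_counts):
    """Naive abundance estimate (assumption: species behave like the spike-ins)."""
    rate = reads_per_individual(spike_reads, spike_counts)
    if rate <= 0:
        raise ValueError("spike-ins yielded no reads; cannot calibrate")
    return species_reads / rate


if __name__ == "__main__":
    # Two spike-in species: 1 individual gave 100 reads, 2 individuals gave 200 reads.
    print(estimate_abundance(300, [100, 200], [1, 2]))
```

A single ratio ignores the taxon-specific biases the abstract reports, which is presumably why the authors match each species to a best-fitting calibration curve instead.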
Armstrong, E. E.; Li, C.; Campana, M. A.; Ferrari, T.; Kelley, J. L.; Petrov, D.; Solari, K. A.; Mooney, J. A.
Despite substantial reductions in the cost of sequencing over the last decade, genetic panels remain relevant due to their cost-effectiveness and flexibility across a variety of sample types. In particular, single nucleotide polymorphism (SNP) panels are increasingly favored for conservation applications. SNP panels are often used because of their adaptability, effectiveness with low-quality samples, and cost-efficiency for use in population monitoring and forensics. However, the selection of diagnostic SNPs for population assignment and individual identification can be challenging. The consequences of poor SNP selection are under-powered panels, inaccurate results, and monetary loss. Here, we develop a novel user-friendly SNP selection pipeline for population assignment and individual identification, mPCRselect. mPCRselect allows any researcher, who has sufficient SNP-level data, to design a successful and cost-effective SNP panel for species of conservation concern.
Gautier, M.; Coronado-Zamora, M.; Vitalis, R.
Introduced over seventy years ago, F-statistics have been and remain central to population and evolutionary genetics. Among them, FST is one of the most commonly used descriptive statistics in empirical studies, notably to characterize the structure of genetic polymorphisms within and between populations, to shed light on the evolutionary history of populations, or to identify marker loci under differential selection for adaptive traits. However, the use of FST in simplified population models can overlook important hierarchical structures, such as geographic or temporal subdivisions, potentially leading to misleading interpretations and increasing false positives in genome scans for adaptive differentiation. Hierarchical F-statistics have been introduced to account for multiple predefined levels of population structure. Several estimators have also been proposed, including robust ones implemented in the popular R package hierfstat. Nevertheless, these were primarily designed for individual genotyping data and can be computationally intensive for large genomic datasets. In this study, we extend previous work by developing unbiased method-of-moments estimators for hierarchical F-statistics tailored for Pool-Seq data, a cost-effective alternative to individual genome sequencing. These Pool-Seq estimators have been developed in an ANOVA framework, using definitions based on identity-in-state probabilities. The new estimators have been implemented in an updated version of the R package poolfstat, together with estimators for sample allele count data derived from individual genotyping data. We validate and compare the performance of these estimators through extensive simulations under a hierarchical island model. 
Finally, we apply these estimators to real Pool-Seq data from Drosophila melanogaster populations, demonstrating their usefulness in revealing population structure and identifying loci with high differentiation within or between groups of subpopulations and associated with spatial or temporal genetic variation.
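To make the identity-in-state definitions mentioned above concrete, the sketch below computes hierarchical F-statistics from three identity-in-state probabilities. It is an illustration under stated assumptions, not the poolfstat implementation: the symbols Q1 (two genes from the same subpopulation), Q2 (different subpopulations, same group), Q3 (different groups) and the labels F_SG, F_GT, F_ST are notation chosen here, and the values are plugged in directly rather than estimated without bias from Pool-Seq read counts.

```python
def hierarchical_f(q1, q2, q3):
    """Hierarchical F-statistics from identity-in-state (IIS) probabilities.

    q1: IIS probability for two genes sampled within the same subpopulation
    q2: IIS probability for genes from different subpopulations of one group
    q3: IIS probability for genes from different groups
    (Naming is an assumption made for this sketch.)
    """
    if q2 >= 1 or q3 >= 1:
        raise ValueError("q2 and q3 must be below 1")
    return {
        "F_SG": (q1 - q2) / (1 - q2),  # subpopulations within groups
        "F_GT": (q2 - q3) / (1 - q3),  # groups within the total
        "F_ST": (q1 - q3) / (1 - q3),  # subpopulations within the total
    }


if __name__ == "__main__":
    f = hierarchical_f(0.8, 0.7, 0.6)
    print(f)
    # The three levels are linked by (1 - F_ST) = (1 - F_SG) * (1 - F_GT).
    print(1 - f["F_ST"], (1 - f["F_SG"]) * (1 - f["F_GT"]))
```

The multiplicative relation printed at the end follows algebraically from the three ratios, so it holds for any valid inputs and is a convenient sanity check.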
Ollivier, M.; Marquisseau, A.; Dufrene, E.; Rudelle, R.; The CODABEILLES Consortium, ; Rougerie, R.; Perrard, A.; Pichon, M.
In the Anthropocene, the decline of insect pollinators poses a significant threat to ecosystem services, particularly to wild bee populations essential for plant biodiversity and agricultural productivity. France, with 983 species, hosts one of the most diverse bee faunas in Europe, yet these species face growing pressures from habitat loss, climate change, and intensive agriculture. Addressing this crisis requires robust taxonomic frameworks and efficient species identification methods to support long-term monitoring initiatives such as the European Pollinator Monitoring Scheme, EU-PoMS. DNA barcoding, utilizing the COI-5P gene, has proven effective for species delineation and biodiversity monitoring, particularly in detecting cryptic diversity among genera with large numbers of species such as Andrena, Nomada or Lasioglossum. However, significant gaps remain in reference libraries, particularly for the species from the Mediterranean Basin. To bridge this gap, the CODABEILLES initiative was launched in 2021 to enhance barcode data for the French bee fauna. Initially, only 25% of species had barcodes from French voucher specimens, increasing to 62% when considering voucher specimens from other countries. By 2025, thanks to collaboration with sixteen specialists and institutions, CODABEILLES contributed 1477 reference barcodes, covering approximately 560 species and raising barcode coverage to 82%. When integrating data published under other initiatives over the same period the coverage reaches 94% of the French bee fauna. This dataset significantly enhances species identification accuracy and supports large-scale pollinator monitoring through metabarcoding and environmental DNA approaches. Despite the success of COI-5P barcoding, taxonomic inconsistencies persist, necessitating further integrative research. 
This study underscores the need for continued collaboration among taxonomists, molecular biologists, and conservationists to refine species classifications and ensure comprehensive reference databases. The improved barcode coverage provided by CODABEILLES paves the way for more accurate DNA-based monitoring of wild bee populations and their ecological interactions, crucial for guiding conservation strategies in the face of ongoing environmental change.
Brandao-Dias, P. F.; Guri, G.; Shaffer, M.; Allan, E. A.; Kelly, R. P.
Environmental DNA (eDNA) metabarcoding provides powerful insights into species presence and community composition, but remains limited in its ability to quantify species abundance or structure. Here, we show that deviation between observed haplotype frequencies within a given sample and the population haplotype frequencies can be used to infer the number of individual contributors to an eDNA sample. We also lay out the theory for how population haplotype frequencies can be approximated from eDNA data alone, enabling broad applicability even in the absence of tissue-based references. We then present an estimator to derive the number of individual contributors to a given eDNA sample and validate its performance using simulations with variable allele frequencies and noise. Our framework demonstrates that differences between expected and observed frequencies carry meaningful biological information in eDNA data. Our results show that the number of contributors can be recovered under a range of conditions, particularly with hypervariable markers and sufficient sampling. This approach complements existing molecular methods and opens a new avenue for inferring abundance from eDNA metabarcoding datasets.
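The intuition that haplotype-frequency deviations carry information about contributor number can be sketched with a simple moment argument. This is not the authors' estimator. It assumes N haploid contributors drawn independently from the population, each contributing equally and without sequencing noise, so that the expected sum of squared deviations between sample and population frequencies is (1 - sum of p_i squared) / N. The function name is hypothetical.

```python
def estimate_contributors(sample_freqs, population_freqs):
    """Moment estimate of the number of contributors to one eDNA sample.

    Under multinomial sampling of N haplotypes, Var(p_hat_i) = p_i (1 - p_i) / N,
    so E[sum_i (p_hat_i - p_i)^2] = (1 - sum_i p_i^2) / N. Solving for N gives
    the ratio returned here. Assumes equal contribution and no noise.
    """
    if len(sample_freqs) != len(population_freqs):
        raise ValueError("frequency vectors must have the same length")
    squared_deviation = sum(
        (obs - pop) ** 2 for obs, pop in zip(sample_freqs, population_freqs)
    )
    if squared_deviation == 0:
        raise ValueError("no deviation observed; contributor number is unbounded")
    gene_diversity = 1 - sum(pop ** 2 for pop in population_freqs)
    return gene_diversity / squared_deviation


if __name__ == "__main__":
    population = [0.5, 0.3, 0.2]
    sample = [0.5, 0.25, 0.25]
    print(estimate_contributors(sample, population))
```

Small deviations imply many contributors and large deviations imply few. A single sample gives a very noisy estimate, which is consistent with the abstract's emphasis on hypervariable markers and sufficient sampling.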
Kuijk, J.; van den Burg, M.; Didaskalou, E.; de Boer, M.; Debrot, A.; Wielstra, B.; Stewart, K. A.
Reptiles have among the highest extinction risk across terrestrial vertebrates, with habitat fragmentation, habitat destruction, and invasive alien species being the primary causes of reptile species loss on a global scale. Invasive hybridization (i.e. hybridization between native and invasive alien species) is increasing globally, causing the extinction of native genotypes, and this phenomenon is particularly pervasive in Caribbean iguanas. The Lesser Antillean Iguana (Iguana delicatissima), a keystone species of Caribbean coastal ecosystems, has become critically endangered mainly due to ongoing hybridization with the invasive Common Green Iguana (I. iguana). For impactful conservation intervention, the need for early detection of invasive animals and their progeny, or detection of surviving pure native animals, is urgent. We aimed to develop a novel environmental DNA (eDNA) toolkit using Kompetitive Allele Specific PCR (KASP) technology, a method of allele-specific amplification for cost-effective and efficient sampling of terrestrial substrates to aid in mapping the distribution of native I. delicatissima, invasive I. iguana, and signal potential invasive hybridization. We demonstrate proof-of-concept and successfully identified I. delicatissima, I. iguana, and their hybrids via blood samples using our primer sets, as well as successful detection of I. delicatissima in several ex-situ (Rotterdam Zoo) and in-situ (St. Eustatius) eDNA samples, collected with environmental swabs and tape-lifting. We found that sampling potential perching spots yielded the highest number of positive detections via environmental swabbing and tape-lifting. Our toolkit demonstrates the potential of terrestrial eDNA sampling for iguana conservation, enabling faster detection of potential invasive hybridization. Additionally, the method holds promise for other terrestrial cryptic species, contributing to broader collection of population-level information.
Jecha, K.; Lavanchy, G.; Schwander, T.
Advancements in genetic technologies have allowed us to generate large data sets relatively quickly and easily. However, without proper quality control checks, the inferences drawn from such data can be erroneous and go on to misinform further studies. DNA contamination between focal samples of the same or closely related species can have major impacts on downstream analyses, but its presence is seldom tested. Here, we created a pipeline combining competitive mapping to remove reads from intergeneric contamination, followed by a filtering method using allelic depth ratio frequencies to exclude intrageneric contamination. We then used a RADseq dataset of over 1,000 Swiss Lasius ants that were cross contaminated to various levels prior to sequencing to assess the impact of contamination on inferences of introgression. The original dataset presented widespread introgression between species in which hybridization has never been recorded. After thorough decontamination, we found only one individual with a strong signature of introgression, between the species L. emarginatus and L. platythorax, revealing that introgression is extremely rare in this genus. Implementing our method of filtering can significantly improve the robustness of biological findings based on genomic datasets. We recommend that systematically checking for the presence of cross contamination should be a key step in the preprocessing of genomic datasets.
Melendez, D.; Sapci, A. O. B.; Bafna, V.; Mirarab, S.
Ultraconserved elements (UCEs) provide ideal candidates for targeted sequencing and cost-effective acquisition of genome-wide data. While UCEs have been widely used in phylogenetic studies to reconstruct evolutionary relationships, their use in population-level research has been limited. This limited application stems from uncertainty over whether UCEs can capture the levels of genetic variation needed to answer population genomic questions central to ecology and biodiversity research. The concern is that, by definition, UCEs are highly conserved and may therefore lack sufficient within-species variation. The more variable flanking regions (400-750 bp from the UCE core) contain informative polymorphisms, though diversity decreases near the core. Thus, any naive estimator of genetic diversity that ignores this conservation will have an underestimation bias. In this paper, we introduce SPrUCE: Sigmoid Pi requiring UCEs, a reference-free method that estimates nucleotide diversity (π) from aligned UCE data. SPrUCE corrects underestimation bias by modeling the change in diversity away from the UCE core using a Gompertz function. The model accounts for the bias introduced by the conserved core and allows for more accurate per-site diversity estimates. We tested SPrUCE on UCE alignments from a range of taxa, including invertebrates and vertebrates (finches, honeybees, sheep, and smelt). SPrUCE produces diversity values consistent with whole-genome derived estimates that require an assembled reference. It is fast, scalable, and effective even with missing data. Its modeling approach enables accurate population-level assessments of genetic diversity, offering a new and reliable option for conservation and population genetics.
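The underestimation bias described above can be illustrated with a generic Gompertz curve. This is a sketch under assumptions, not the SPrUCE code: the three-parameter form plateau * exp(-displacement * exp(-rate * distance)) is one common Gompertz parameterization, and the parameter names and example values are hypothetical.

```python
import math


def gompertz_diversity(distance_bp, plateau, displacement, rate):
    """Per-site diversity at a given distance from the UCE core.

    Rises from plateau * exp(-displacement) at the core toward `plateau`,
    which plays the role of the unconstrained flanking diversity.
    """
    return plateau * math.exp(-displacement * math.exp(-rate * distance_bp))


def naive_mean_diversity(distances_bp, plateau, displacement, rate):
    """Plain average over sites, ignoring conservation near the core."""
    values = [
        gompertz_diversity(d, plateau, displacement, rate) for d in distances_bp
    ]
    return sum(values) / len(values)


if __name__ == "__main__":
    plateau, displacement, rate = 0.01, 5.0, 0.01  # illustrative values only
    sites = list(range(0, 751, 50))
    print("naive mean:", naive_mean_diversity(sites, plateau, displacement, rate))
    print("plateau:   ", plateau)
```

Because the curve sits below its plateau at every finite distance, a plain average over sites is always lower than the plateau, which is the bias that a fitted asymptote is meant to correct.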
Urban, L.; Miller, A. K.; Eason, D.; Vercoe, D.; Shaffer, M.; Wilkinson, S. P.; Jeunen, G.-J.; Gemmell, N. J.; Digby, A.
We used non-invasive real-time genomic approaches to monitor one of the last surviving populations of the critically endangered kākāpō (Strigops habroptilus). We first established an environmental DNA metabarcoding protocol to identify the distribution of kākāpō and other vertebrate species in a highly localized manner using soil samples. Harnessing real-time nanopore sequencing and the high-quality kākāpō reference genome, we then extracted species-specific DNA from soil. We combined long read-based haplotype phasing with known individual genomic variation in the kākāpō population to identify the presence of individuals, and confirmed these genomically informed predictions through detailed metadata on kākāpō distributions. This study shows that individual identification is feasible through nanopore sequencing of environmental DNA, with important implications for future efforts in the application of genomics to the conservation of rare species, potentially expanding the application of real-time environmental DNA research from monitoring species distribution to inferring fitness parameters such as genomic diversity and inbreeding.
van Berkel, D.; Breve, N.; de Boer, M.; Reynaud, E.; Nijland, R.
Molecular techniques involving environmental DNA (eDNA) are increasingly used for aquatic species detection. Metabarcoding, a widely adopted technique, suffers from primer bias: uneven amplification of species due to primer mismatches. The primer bias can be eliminated by omitting PCR, thereby sequencing all eDNA in a sample. This method, known as metagenomics, offers potential benefits for relative abundance estimates and for detecting epigenetic modifications, but is seldom applied to eukaryotic communities and eDNA. This study uses an expanded two-by-two design to compare fish species detection between multi-marker metabarcoding and metagenomics using two filter types (conventional versus high-flow). Environmental DNA was collected in a controlled setup and two field settings, which contained several fish species including European sturgeon (Acipenser sturio). Moreover, we explore methylation patterns obtained from nanopore native sequencing. All species present in the controlled environment were detected using both metabarcoding and metagenomics. In field settings, metagenomics detected more species than metabarcoding. High-flow filters recovered more species across all sequencing datasets, except in metabarcoding of field settings. Relative read counts between metabarcoding and metagenomics illustrate that primer bias is present in the primer sets used. Most fish metagenomic sequences were identified as A. sturio across all eDNA samples. We observed three base modifications on the 18S region of A. sturio, where three sites showed different methylation patterns between eDNA samples. Our results demonstrate that metabarcoding and metagenomics are complementary in species detection and that metagenomics provides additional insights into base modifications. Moreover, high-flow filters offer strong potential for improved species detection in various environments.
Hong, A.; Cheek, R. G.; Mukherjee, K.; Yooseph, I.; Oliva, M.; Heim, M.; Funk, W. C.; Tallmon, D.; Boucher, C.
The genetic effective size (Ne) is arguably one of the most important characteristics of a population as it impacts the rate of loss of genetic diversity. Genetic estimators of Ne are increasingly popular tools in population and conservation genetic studies. Yet there are very few methods that can estimate Ne from data from a single population and without extensive information about the genetics of the population, such as a linkage map, or a reference genome of the species of interest. We present ONeSAMP 3.0, an algorithm for estimating Ne from single nucleotide polymorphism (SNP) data collected from a single population sample using Approximate Bayesian Computation and local linear regression. We demonstrate the utility of this approach using simulated Wright-Fisher populations, and empirical data from five endangered Channel Island fox (Urocyon littoralis) populations to evaluate the performance of ONeSAMP 3.0 compared to a commonly used Ne estimator. Our results show that ONeSAMP 3.0 is robust to the number of individual samples and number of loci included and appears accurate even if the range of true Ne values is large. This method is broadly applicable to natural populations and is flexible enough that future versions could easily include summary statistics appropriate for a suite of biological and sampling conditions. ONeSAMP 3.0 is publicly available under the GNU license at https://github.com/AaronHong1024/ONeSAMP_3 and also available with Bioconda (https://bioconda.github.io/index.html).
Pavinato, V. A. C.; Wijeratne, S.; Spacht, D.; Denlinger, D. L.; Meulia, T.; Michel, A. P.
The sequencing of whole or partial (e.g. reduced representation) genomes is commonly employed in molecular ecology and conservation genetics studies. However, due to sequencing costs, a trade-off between the number of samples and genome coverage can hinder research for non-model organisms. Furthermore, the processing of raw sequences requires familiarity with coding and bioinformatic tools that are not always available. Here, we present a guide for isolating a set of short, SNP-containing genomic regions for use with targeted amplicon sequencing protocols. We also present a Python pipeline, PypeAmplicon, that facilitates processing of reads to individual genotypes. We demonstrate the applicability of our method by generating an informative set of amplicons for genotyping of the Antarctic midge, Belgica antarctica, an endemic dipteran species of the Antarctic Peninsula. Our pipeline analyzed raw sequences produced by a combination of high-multiplexed PCR and next-generation sequencing. A total of 38 out of 47 (81%) amplicons designed by our panel were recovered, allowing successful genotyping of 42 out of 55 (76%) targeted SNPs. The sequencing of ~150 bp around the targeted SNPs also uncovered 80 new SNPs, which complemented our analyses. By comparing overall patterns of genetic diversity and population structure of amplicon data with the low-coverage, whole-genome re-sequencing (lcWGR) data used to isolate the informative amplicons, we were able to demonstrate that amplicon sequencing produces information and results similar to that of lcWGR. Our methods will benefit other research programs where rapid development of population genetic data is needed but is prevented by high expense and a lack of bioinformatic experience.
Landis, J. B.; Hufnagel, E.; Felton, J. M.; Harden, J. J.; Almeida, D.; Specht, C. D.
Recent advancements in next generation sequencing approaches allow for expansion of evolutionary research into the discovery of genetic patterns and processes underlying diversification across scales. The Element Biosciences AVITI platform, increasingly popular partly because of its high sequencing accuracy and low reagent cost, is becoming a viable alternative for generating massive amounts of comparative sequencing data across diverse organismal lineages. Using a data set of five accessions from the monocot genus Costus, we tested miniaturization conditions for generating robust, cost-effective libraries and made comparisons of data generated by AVITI and Illumina sequencing platforms to investigate the potential for combining data for population genomic and phylogenomic analyses. Our results show that the AVITI and Illumina data sets are highly congruent in terms of inferring overlapping SNPs, with only a small fraction picked up by only one of the two platforms. The rates of duplication in miniaturized libraries were much higher than in full volume libraries and in the Illumina libraries, resulting in missing SNPs and less sequence coverage when volumes are reduced. For all generated libraries, most downstream evolutionary analyses, including clustering algorithms (such as PCA) and phylogenetic inference, yielded similar results. However, Structure analyses were less consistent across datasets, with data from the most miniaturized libraries being assigned to the wrong clusters. The AVITI platform should be seen as a cost-effective approach for generating genomic data for comparison across taxonomic lineages, even for ongoing projects where Illumina data already exist.
Tsuji, S.; Shibata, N.; Yatsuyanagi, T.; Fuke, Y.
Environmental DNA (eDNA) analysis is increasingly recognised as a valuable method for assessing genetic diversity. However, its resolution and applicability are limited by the short length of sequences that can be analysed (typically < 400 bp) and high analytical costs. This study developed a practical, low-cost long-fragment eDNA analysis method using commercial full-length plasmid sequencing via a nanopore platform and evaluated its effectiveness in assessing population genetic structure. 1 L of surface water was collected from 52 sites across Hokkaido, Japan, targeting Barbatula oreas. Two mitochondrial regions (ND5 and cyt b; approximately 1,000 bp each) were species-specifically amplified, circularised, and sequenced. Library preparation took 2.5 hours, with a total cost per sample of 4,390 JPY (≈25.55 EUR, ≈29.87 USD). High-quality reads were obtained from 34 samples, allowing for the reconstruction of multiple haplotypes per region through haplotype phasing. The eDNA concentration required to achieve a 50% sequencing success was within a range easily attainable for common species. Phylogenetic analysis using 62 concatenated haplotypes (1,968 bp) obtained from each sample identified two clades and multiple regional subgroups, providing higher-resolution phylogeographic information than the previous study. Furthermore, the differentiation of each clade and group was suggested to reflect geological and climatic events. These results demonstrate the feasibility and utility of long-fragment eDNA analysis for evaluating genetic diversity, and its broad application is anticipated in ecological research, conservation management, and environmental policy formulation.
Duarte, S.; Simoes, L.; Costa, F. O.
Animal detection through DNA present in environmental samples (eDNA) is a valuable tool for detecting rare species that are difficult to observe and monitor. eDNA-based tools are underpinned by molecular evolutionary principles, which are key to devising tools that efficiently single out a targeted species from an environmental sample, using carefully chosen marker regions and customized primers. Here, we present a comprehensive review of the use of eDNA-based methods for the detection of targeted animal species, such as rare, endangered, or invasive species, through the analysis of 460 publications (2008-2022). Aquatic ecosystems have been the most surveyed, in particular freshwaters (75%), and to a lesser extent marine (14%) and terrestrial systems (10%). Vertebrates, in particular fish (38%), and endangered species have received the most attention in these studies, and Cytb and COI are the most employed markers. Among invertebrates, assays have been mainly designed for Mollusca and Crustacea species (22%), in particular to target invasive species, and COI has been the most employed marker. Targeted molecular approaches, in particular qPCR, have been the most adopted (73%), while eDNA metabarcoding has rarely been used to target single or few species (approx. 5%). However, less attention has been given in these studies to the effects of environmental factors on the amount of shed DNA, the differential amount of shed DNA among species, or the sensitivity of the markers developed, which may impact the design of the assays, particularly to warrant the required detection level and avoid false negatives and positives. The accuracy of the assays will also depend on the availability of genetic data from closely related species to assess both marker and primer specificity. In addition, eDNA-based assays developed for a particular species may have to be refined taking into account site-specific populations, as well as any intraspecific variation.
Pfenninger, M.; Schoennenbeck, P.; Schell, T.
Precise estimates of genome sizes are important parameters for both theoretical and practical biodiversity genomics. We present here a fast, easy-to-implement and precise method to estimate genome size from the number of bases sequenced and the mean sequence coverage. To estimate the latter, we take advantage of the fact that a precise estimation of the Poisson distribution parameter lambda is possible from truncated data, restricted to the part of the coverage distribution representing the true underlying distribution. With simulations we could show that reasonable genome size estimates can be gained even from low-coverage (10X), highly discontinuous genome drafts. Comparison of estimates from a wide range of taxa and sequencing strategies with flow-cytometry estimates of the same individuals showed a very good fit and suggested that both methods yield comparable, interchangeable results.
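The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of truncation window, and the use of a golden-section search are all assumptions made for illustration. The sketch fits the Poisson parameter lambda by maximum likelihood to a per-position coverage histogram restricted to a window of depths, then divides the number of sequenced bases by that lambda.

```python
import math


def truncated_poisson_loglik(lam, hist, lo, hi):
    """Log-likelihood of Poisson(lam) truncated to depths lo..hi (inclusive).

    hist maps sequencing depth -> number of positions with that depth.
    Depths outside the window are ignored (assumed to be contaminated by
    repeats, errors, or gaps).
    """
    log_pmf = {k: k * math.log(lam) - lam - math.lgamma(k + 1)
               for k in range(lo, hi + 1)}
    # log of the probability mass inside the window (log-sum-exp for stability)
    m = max(log_pmf.values())
    log_norm = m + math.log(sum(math.exp(v - m) for v in log_pmf.values()))
    return sum(n * (log_pmf[k] - log_norm)
               for k, n in hist.items() if lo <= k <= hi)


def fit_lambda(hist, lo, hi, tol=1e-7):
    """Maximum-likelihood lambda via golden-section search (hypothetical helper)."""
    if not any(lo <= k <= hi and n > 0 for k, n in hist.items()):
        raise ValueError("no observations inside the truncation window")
    ratio = (math.sqrt(5) - 1) / 2
    a, b = 0.1, 3.0 * hi  # assumed search bracket for lambda
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if truncated_poisson_loglik(c, hist, lo, hi) > truncated_poisson_loglik(d, hist, lo, hi):
            b = d
        else:
            a = c
    return (a + b) / 2


def estimate_genome_size(total_bases, lam):
    """Genome size = number of bases sequenced / mean coverage."""
    return total_bases / lam
```

In use, `hist` would come from a per-position depth table, and the window `lo..hi` would be chosen around the main coverage peak so that the low-depth and high-depth tails are excluded from the fit.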
Tsuji, S.; Miuchi, Y.; Shibata, N.; Watanabe, K.
Understanding fine-scale population genetic structure is essential for biodiversity conservation and evolutionary research, but conventional phylogeographic studies often face labour and financial cost constraints. This study proposes a two-step survey strategy that integrates environmental DNA (eDNA) analysis and PCR-based genome-wide SNP genotyping, aiming to evaluate its effectiveness by comprehensively characterising the population structure of widely distributed species. As a model system, we selected the odontobutid gobies, Odontobutis obscurus and O. hikimius, which occur in western Japan. Initially, water samples were collected from 335 sites across western Japan. Subsequently, tissue sampling for SNP analysis was conducted at 49 sites representing the regional groups and species. The eDNA analysis revealed two major mitochondrial clades within O. obscurus, each comprising multiple geographically distinct groups. Nuclear genomic SNP data (661 loci) corroborated the deep divergence between the two clades of O. obscurus and, unexpectedly, indicated that O. hikimius, whose range lies at their boundary, originated through hybridisation between them. Geographic patterns of the regional groups inferred from both mitochondrial and nuclear data were largely explained by historical geological events such as mountain uplift and ancient river system dynamics, and provide unprecedentedly detailed insight into the population structuring of the focal species. This study demonstrates that the integration of eDNA and SNP analyses provides a cost-effective and scalable approach for high-resolution phylogeographic surveys.